Approaches to machine learning

Authors

  • Pat Langley
  • Jaime G. Carbonell
Abstract

The field of machine learning strives to develop methods and techniques to automate the acquisition of new information, new skills, and new ways of organizing existing information. In this article, we review the major approaches to machine learning in symbolic domains, covering the tasks of learning concepts from examples, learning search methods, conceptual clustering, and language acquisition. We illustrate each of the basic approaches with paradigmatic examples.

To appear in Journal of the American Society for Information Science. Correspondence should be addressed to Pat Langley, The Robotics Institute, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213. This work was supported in part by Contract N00014-82-C-50767 from the Office of Naval Research.

1. Introduction: Why Machine Learning?

Learning is ubiquitous in intelligence, and it is natural that Artificial Intelligence (AI), as the science of intelligent behavior, be centrally concerned with learning. There are two clear reasons for this concern, one practical and one theoretical. With respect to the first, AI has now demonstrated the utility of expert systems, but these systems often require several man-years to construct. An expert system consists of a symbolic reasoning engine plus a large domain-specific knowledge base. Expert systems that rival or surpass human performance at very narrowly defined tasks are proliferating rapidly as AI is applied to new domains. A better understanding of learning methods would enable us to automate the acquisition of the domain-specific knowledge bases for new expert systems, and thus greatly speed the development of applied AI programs. On the theoretical side, expert systems are unattractive because they lack the generality that science requires of its theories and explanations. On this dimension, the study of learning may reveal general principles that apply across many different domains.
A third research goal is to emulate human learning mechanisms, and thus come to a better understanding of the cognitive processes that underlie human knowledge and skill acquisition. In addition to improving our knowledge of human behavior, studying human learning may produce benefits for AI, since humans are the most flexible and robust (if slow) learning systems in existence. Hence, one objective of machine learning is to combine the capabilities of modern computers with the flexibility and resilience of human cognition. As Simon [1] has pointed out, if learning could be automated and the results of that learning transferred directly to other machines, which could further augment and refine the knowledge, one could accumulate expertise and wisdom in a way not possible by humans: each individual person must learn all relevant knowledge without benefit of a direct copying process. Thus, no single mind can hold the collective knowledge of the species.

2. A Historical Sketch

Historically, researchers have taken two approaches to machine learning. Numerical methods such as discriminant analysis have proven quite useful in perceptual domains, and have become associated with the paradigm known as Pattern Recognition. In contrast, Artificial Intelligence researchers have concentrated on symbolic learning methods, which have proven useful in other domains. The symbolic approach to machine learning has received growing attention in recent years, and in this paper we review some of the main approaches that have been taken within this paradigm, and outline some of the work that remains to be done. Within the symbolic learning paradigm, work first focused on learning simple concepts from examples. This originally involved artificial tasks similar to questions found in intelligence tests given to children, such as "What do all these pictures have in common?" and "Does this new picture belong in the group?"
Such tasks involve the formulation of some hypothesis that predicts which instances should be classified as examples of the concept. Not too surprisingly, psychologists were among the active researchers in this early stage (e.g., Hunt, Marin, and Stone [3]). Subsequent work focused on learning progressively more complex concepts, often requiring larger numbers of exemplars. Recent work has focused on more complex learning tasks, in which the learner does not rely so heavily on a tutor for instruction. For example, some of this research has focused on learning in the context of problem solving, while others have explored methods for learning by observation and discovery. Learning by analogy with existing plans or concepts has also received considerable attention. (Samuel's [2] early checkers learning system was a notable exception to the later trend, relying mainly on parameter-fitting methods to improve performance.)

In the following pages, we examine four categorical tasks that have been addressed in the machine learning literature: learning from examples, learning search heuristics, learning by observation, and language acquisition. These four representative tasks do not, by any means, cover all approaches to machine learning, but they should provide an illustrative sample of the issues, methods, and techniques of primary concern to the field. In each case, we describe the task, consider the main approaches that have been employed, and identify some open problems in the area. As is typical in a survey article, we can only highlight the best known approaches and results in the area of machine learning, giving the reader a feeling for where the field as a whole has been and where it is heading. The serious reader is encouraged to digest other reviews of machine learning work by Mitchell [4], Dietterich and Michalski [5], and Michalski, Carbonell, and Mitchell [6].

Figure 1. Positive and negative instances of "arch".
3. Learning Concepts From Examples

Methods for learning concepts from examples have received more attention than any other aspect of machine learning. The task appears straightforward: given a set of positive and negative instances of a concept, generate some rule or description that correctly identifies these and all future examples as instances or non-instances of the concept. However, despite its apparent simplicity, the approaches taken to solving this problem are nearly as numerous as the people who have worked on it. Below, we consider one approach to learning from examples, and then examine some of the dimensions along which different approaches to this problem vary. After this, we discuss some open issues in learning from examples that remain to be addressed.

3.1. An Example

Perhaps the best known research on learning from examples is Winston's [7] work on the "arch" concept. Figure 1 presents two examples of this concept and one counterexample that are very similar to those presented to Winston's system. Given these instances, one might conclude that "An ARCH consists of two vertical blocks and one horizontal block". This hypothesis covers both positive instances and excludes the negative one. Alternately, one could define "arch" as simply a union of all positive examples of ARCH ever encountered. However, the principles of brevity and generality preclude us from formulating such a definition, since we would like our concept to be as simple as possible, and for it to be able to predict new positive and negative instances. Given the first hypothesis, there is hope that a simple and general definition of "arch" will converge and help us recognize future examples of arches. Now let us consider the two instances shown in Figure 2. Upon considering the positive instance, we realize that our concept of arch is too restrictive, since it excludes this instance. Therefore, we revise the concept to "An ARCH consists of two vertical blocks and one horizontal object".
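This generalize-on-omission step can be sketched schematically as follows. This is only a minimal illustration of the general idea, not Winston's actual program; the attribute names and instances are invented, and the hypothesis is reduced to a flat attribute-value description rather than a relational one.

```python
# A minimal sketch of incremental concept refinement over attribute-value
# descriptions.  Attribute names and instances here are hypothetical.

def covers(hypothesis, instance):
    """A hypothesis covers an instance if the instance satisfies
    every constraint the hypothesis imposes."""
    return all(instance.get(attr) == val for attr, val in hypothesis.items())

def generalize(hypothesis, positive):
    """Error of omission: drop any constraint the new positive violates."""
    return {attr: val for attr, val in hypothesis.items()
            if positive.get(attr) == val}

# Start with a maximally specific hypothesis taken from the first positive.
h = {"supports": 2, "support_orientation": "vertical", "top": "block"}

# A new positive arch whose top is a wedge rather than a block.
new_positive = {"supports": 2, "support_orientation": "vertical", "top": "wedge"}
if not covers(h, new_positive):      # error of omission detected
    h = generalize(h, new_positive)

print(h)   # the "top" constraint has been dropped
```

Specializing in response to an error of commission works in the opposite direction, adding constraints until the offending negative instance is excluded.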
However, this new hypothesis covers some of the negative instances, suggesting that it is overly general in some respect. Revising the definition to exclude these instances, we might get: "An ARCH consists of two vertical blocks that do not touch and a horizontal object that rests atop both blocks." One can continue along these lines, gradually refining the concept to include all the positive but none of the negative examples. New positive instances that are not covered by the current hypothesis (errors of omission) tell us that the concept being formulated is overly specific, while new negative examples that are covered by the hypothesis (errors of commission) tell us it is overly general. We have not been very specific about how the learner responds to these two situations, but we consider some of the alternatives below. All systems that learn from examples employ these two types of information, though we will see that they use them in quite different ways.

Figure 2. Additional positive and negative examples of "arch".

Lest the reader get the false impression that modifying an existing definition of a concept to accommodate a new positive or negative exemplar is always a simple process, we offer the positive and negative examples in Figure 3. We challenge the reader to devise an automated process that can modify "ARCH" to account for these examples. One insight that arises from these instances is that our concept of ARCH might involve some functional aspects as well as the structural ones we have focused on so far. We shall have more to say on this matter later.

3.2. The Dimensions of Learning

As Mitchell [4] and Dietterich and Michalski [5] have pointed out, all AI systems that learn from examples can be viewed as carrying out search through a space of possible concepts, represented as recognition rules or declarative descriptions.
Moreover, this space is partially ordered along the dimension of generality, and it is natural to use this partial ordering to organize the search process. (It is this partial ordering that leads to branching, and thus to search. If the space were completely ordered, then the task of learning rules would be much simpler.) However, at this point the similarity between systems ends.

The first dimension of variation relates to the direction of the search through the rule space. Discrimination-based concept learning programs begin with very general rules and make them more specific until all instances can be correctly classified, while generalization-based systems begin with very specific rules and make them more general. Since these two methods approach the goal concept from different directions and more than one concept may be consistent with the data, the two methods need not arrive at the same answer. Dietterich and Michalski have called the rules learned by discrimination systems discriminant descriptions, and the rules learned by generalization systems characteristic descriptions. In general, the latter will be more specific than the former.

A second dimension of variation relates to the manner in which search through the rule space is controlled. Some systems carry out a depth-first search through the space of rules, while others employ a breadth-first search. In depth-first search, the learner focuses on one hypothesis at a time, generating more general or more specific versions of this (depending on the direction of the search) until it finds a description that accounts for the observed instances. In breadth-first search, the system considers a number of alternate hypotheses simultaneously, though many are eliminated as they fail to account for the data. Breadth-first search strategies have greater memory requirements than depth-first methods, but need never back up through the search space.

A third dimension of variation involves the manner in which data is handled. All-at-once systems require all instances to be present at the outset of the learning process, while incremental systems deal with instances one at a time. The former tend to be more robust with respect to noise, while the latter are more plausible models of the human learning process. Finally, concept learning programs differ in the operators they use to move through the rule space. Data-driven systems incorporate instances in the generation of new hypotheses, while enumerative systems use some other source of knowledge to generate states, and employ data only to evaluate these states.

Given these four dimensions, we can determine that 2^4 = 16 basic types of concept learning systems are possible, at least in principle. New researchers in machine learning might take as an exercise the task of classifying existing systems in terms of these dimensions, and brave individuals might attempt to develop a learning system that fills one of the unexplored combinations.

In order to clarify the dimensions along which concept learning systems vary, let us examine two programs that lie at opposite ends of the spectrum on each dimension. For the sake of clarity, we will simplify certain aspects of the programs. The first is Quinlan's ID3 system [8], which has been tested in the domain of chess endgames, where the concepts to be learned are "lost in one move", "lost in two moves", and so forth. The second is Hayes-Roth and McDermott's SPROUTER [9], which has been tested on a number of complex relational instances like those in Figures 1 through 3.

ID3 represents concepts in terms of discrimination networks, as with the disjunctive concept ((large and red) or (blue and circle and small)) shown in Figure 4. The system begins with only the top node of a network, and grows its decision tree one branch at a time.
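The branch-growing step turns on selecting the most discriminating attribute at each node. A sketch of that selection, assuming an information-theoretic evaluation function of the kind Quinlan uses (the toy instances and attribute names are our own, and real ID3 recurses over the resulting subsets):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(l) for l in set(labels)))

def information_gain(instances, labels, attr):
    """Reduction in entropy obtained by splitting on one attribute."""
    n = len(instances)
    remainder = 0.0
    for value in {inst[attr] for inst in instances}:
        subset = [lab for inst, lab in zip(instances, labels)
                  if inst[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

instances = [{"color": "red",  "size": "large"},
             {"color": "red",  "size": "small"},
             {"color": "blue", "size": "large"},
             {"color": "blue", "size": "small"}]
labels = ["+", "+", "-", "-"]

best = max(["color", "size"], key=lambda a: information_gain(instances, labels, a))
print(best)   # color perfectly separates the classes, so it is chosen first
```

Because each split commits to a single attribute before the next is considered, the process is the depth-first, enumerative search described below.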
For instance, the system would first create the (red or blue) branch emanating from the top node. Next, it would create a branch coming from one of the new nodes, if necessary. The tree is grown downward, until terminal nodes are reached which contain only positive or negative instances. Thus, the system can be viewed as discrimination-based, moving from very general rules to very specific ones. At each point, it must select one attribute as more discriminating than others, so it carries out a depth-first search through the space of rules. ID3 is given a list of potentially relevant attributes by the programmer, so that in deciding which branch to create, it uses the data only in evaluating these attributes. The system is thus enumerative rather than data-driven in its search through the rule space. (Mitchell [4] has called these generate and test systems, while Dietterich and Michalski [5] have called them model-driven systems. However, AI associates the first term with systems that proceed exhaustively through a list of alternatives, and associates the second term with systems that rely on large amounts of domain-specific knowledge. We prefer the term enumerative, since a learning system can enumerate a set of alternate hypotheses at each stage in its search, without being either of these.) Finally, the program has all data available at the outset, so that it can use statistical analyses to distinguish discriminating attributes from undiscriminating ones; as a result, ID3 is an all-at-once concept learning system rather than an incremental one. The exact evaluation function Quinlan uses to direct search is based on information theory, but Hunt, Marin, and Stone [3] have used another evaluation function, and the exact function seems to be less important than the overall search organization.

Hayes-Roth and McDermott's SPROUTER [9] is historically interesting, since it was one of the first alternatives to Winston's early work on learning from examples. This program attempts to learn conjunctive characteristic descriptions for a set of data, moving from a very specific initial hypothesis based on the first positive instance to more general rules as more instances are gathered. Thus, Hayes-Roth and McDermott's concept learning system is generalization-based rather than discrimination-based. SPROUTER also differs from ID3 in carrying out a breadth-first search through the rule space, rather than a depth-first search. With respect to positive instances, the system is data-driven, since it uses these instances to generate new hypotheses by finding common structures between them and the current hypotheses. However, the program is enumerative with respect to negative instances, since it uses these only to eliminate overly general hypotheses. Similarly, SPROUTER processes positive instances in an incremental fashion, reading them in one at a time and generalizing its hypotheses accordingly. However, it retains all negative instances in order to evaluate the resulting hypotheses, and processes them in an all-at-once manner. Thus, SPROUTER is something of a hybrid system in that it treats positive and negative instances in quite different ways.

Figure 3. Still more positive and negative instances of "arch".

Figure 4. A concept expressed as a discrimination network.

3.3. Open Problems in Learning from Examples

A number of problems remain to be addressed with respect to learning from examples. Most of these relate to simplifying assumptions that have typically been made about the concept learning task. For instance, many researchers have assumed that no noise is present (i.e., all instances are correctly classified). However, there are many real-world situations in which no rule has perfect predictive power, and heuristic rules that are only usually correct must be employed.
Some learning methods (such as Quinlan's) can be adapted to deal with noisy data sets, while others (such as Hayes-Roth and McDermott's) seem less adaptable. In any case, one direction for future work would be to identify those approaches that are robust with respect to noise, and to identify the reasons for their robustness. Most likely, tradeoffs exist between an ability to deal with noise and the number of instances required for learning, but it would be useful to know the exact nature of such relationships.

A related simplification is that the correct representation is known. If a learning system employs an incomplete or incorrect representation for its concepts, then it may be searching a rule space that does not contain the desired concept. One approach is to construct as good a rule as possible with the representation given; any system that can deal with noise can handle incomplete representations in this manner. A more interesting approach is one in which the system may improve its representation. This is equivalent to changing the space of rules one is searching, and on the surface at least, appears to be a much more challenging problem. Little work has been done in this area, but Utgoff [10] and Lenat [11] have made an interesting start on the problem.

A final simplifying assumption that nearly all concept learning researchers have made is that the concept to be acquired is all or none. In other words, an instance either is an example of the concept or it is not; there is no middle ground. However, almost none of our everyday concepts are like this. Some birds fit our bird stereotype better than others, and some chairs are nearer to the prototypical chair than others. (Is a Dodo a bird? Is a Platypus a better bird? If a person sits on a log, is it a chair? Is it a better chair if we add stubby legs and use a second log as a backrest?)
Unfortunately, all of the existing concept learning systems rely fairly heavily on the sharp and unequivocal distinction between positive and negative instances, and it is not clear how they might be modified to deal with fuzzily-defined concepts such as birds and chairs. This is clearly a challenging direction for future research in machine learning.

The vast majority of work on learning concepts from examples has assumed that a number of instances must be available for successful learning to occur. However, recently a few machine learning researchers have taken a somewhat different approach. DeJong [12] has explored the use of causal information to determine the relevant features in a positive instance of a complex concept, such as kidnapping. By focusing on causal connections between events (such as the reason one would pay money to ensure another's safety), his system is able to formulate a plausible hypothesis on the basis of a single positive instance and no negative instances. Winston [13] has taken a similar approach to learning concepts such as cup. His system is presented with a functional description of a cup (e.g., that it must be capable of containing liquid, that it must be capable of being grasped) and a single positive instance of the concept. The system then uses its knowledge of the world to decide which structural features of the example allow the functional features to be satisfied, again using causal reasoning. These structural features are used in formulating the definition of the concept. Both approaches rely on causal information, and both relate this to some form of functional knowledge. This new approach promises concept learning systems that are much more efficient than the traditional syntactic methods, while retaining the generality of the earlier approaches. We expect to see much more work along these lines in the future.
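The flavor of this functional approach can be conveyed with a small sketch. This is a schematic illustration only, not Winston's program: the feature names and the causal table are invented, and the real systems reason over much richer causal structures than a lookup.

```python
# A schematic sketch of functional concept learning from a single instance:
# keep only those structural features that causally enable some required
# function.  All names below are hypothetical illustrations.

functional_requirements = {"contains-liquid", "graspable", "stable"}

# Hypothetical causal knowledge: which structural feature enables which function.
enables = {"concave-body": {"contains-liquid"},
           "handle":       {"graspable"},
           "flat-bottom":  {"stable"},
           "red-color":    set()}          # enables no required function

# Structural features observed in the single positive instance of "cup".
instance_features = {"concave-body", "handle", "flat-bottom", "red-color"}

definition = {f for f in instance_features
              if enables.get(f, set()) & functional_requirements}

print(sorted(definition))   # the color is dropped as causally irrelevant
```

A purely syntactic learner would need further negative instances to discover that color is irrelevant; the causal knowledge lets one positive instance suffice.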
4. Learning Search Methods

One of the central insights of AI is that intelligence involves the ability to solve problems by searching the space of possible actions and possible solutions, and to employ knowledge to constrain that search. In fact, one of the major differences between novices and experts in a complex domain is that the former must search extensively, while the latter use domain-specific heuristics to achieve their goal. In order to understand the nature of these heuristics, and how they may be learned, we must recall that search involves states and operators. A problem is stated in terms of an initial state and a goal, and operators are used to transform the initial state into one that satisfies the goal. Search arises when more than one operator can be applied to a given state, requiring consideration of the different alternatives. Of course, some constraints are usually given in terms of the legal conditions under which each operator may apply, but these constraints are seldom sufficient to eliminate search. In order to accomplish this, the learner must also acquire heuristic conditions on the operators. For example, Figure 5 presents a simple search tree involving two operators (O1 and O2), with the solution path shown in bold lines. If the problem solver knew the heuristic conditions on each operator, it would be able to generate the steps along the solution path without considering any of the other moves. The task of learning search methods involves determining these heuristic conditions.

The problem of learning search heuristics from experience can be divided into three steps. First, the system must generate the behavior upon which learning is based. Second, it must distinguish good behavior from bad behavior, and decide which part of the performance system was responsible for each. In other words, it must assign credit and blame to its various parts.
Finally, the system must be able to modify its performance so that behavior will improve in the future. Different learning programs can vary on each of these three dimensions. For instance, though their initial performance component will carry out search, it may use depth-first search, breadth-first search, means-ends analysis, or any one of many other methods for directing the search process. Below we consider some alternative approaches to dealing with credit assignment and modification of the performance system.

Given this framework, the task of learning from examples is easily seen as a special case of the task of learning search heuristics, in which a single operator is involved and for which the solution path is but one step long. No true search control is necessary for the performance component, since feedback occurs as soon as a single "move" has been taken. Credit assignment is trivialized, since the responsible component is easily identified as the rule suggesting the "move". However, the modification problem remains significant, and in fact the task of learning from examples can be viewed as an artificial domain designed for studying the modification problem in isolation from other aspects of the learning process. In a similar fashion, the task of learning search heuristics can be seen as the general case of learning from examples, in which a different "concept" must be learned for each operator. Learning heuristics is considerably more difficult than learning from examples, since the learner must generate its own positive and negative instances, and since the credit assignment problem is nontrivial.

Figure 5. A simple search tree.

4.1. Assigning Credit and Blame

As we have discussed, if a learning system is to improve its behavior, it must decide which components of its performance system are responsible for desirable behavior, and which led to undesirable behavior.
In general, assigning credit and blame can be difficult because many actions may be taken before knowledge of results is obtained, and any one of these actions may be responsible for the error. For instance, if the performance component is represented as a set of production rules, one must decide which of those rules led the system down an undesirable path. The problem of credit assignment is trivial in learning from examples, since feedback is given as soon as a rule applies. However, the task is more formidable in the area of learning search heuristics, and recent progress in this area has resulted mainly from new insights about methods for assigning credit and blame.

The most straightforward of these approaches relies on waiting until a complete solution path to some problem has been found. Since moves along the solution path led the system toward the goal, one can infer that every move on this path is a positive instance of the rule that proposed the move. Similarly, moves that lead one step off of the solution path are likely candidates for negative instances of the rules that proposed them (though it is possible that alternate solutions starting with these moves were overlooked). Let us return to the problem space in Figure 5, with the solution path shown in bold. The move from state 1 to state 2 and from state 5 to state 6 would be classified as good instances of operator O1, while the move from state 2 to state 5 would be marked as a good instance of operator O2. In contrast, the moves from state 1 to state 3, and from state 5 to state 7, would be labeled as bad instances of O1, while the moves from state 2 to 4, and from state 5 to 8, would be noted as bad instances of O2. Moves more than one step off the solution path (these are not shown in the figure) are not classified; since they were not responsible for the initial step away from the goal, they are not at fault.
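The labeling rule just described can be sketched directly. The moves and state numbers follow Figure 5 as described in the text; the encoding of the tree as (from, to, operator) triples is our own device.

```python
# A sketch of the "learning from solution paths" credit-assignment heuristic.
# Each move is a (from_state, to_state, operator) triple, following Figure 5.

moves = [(1, 2, "O1"), (1, 3, "O1"),
         (2, 5, "O2"), (2, 4, "O2"),
         (5, 6, "O1"), (5, 7, "O1"), (5, 8, "O2")]
solution_path = [1, 2, 5, 6]

def label(move):
    frm, to, op = move
    steps = [solution_path[i:i + 2] for i in range(len(solution_path) - 1)]
    if [frm, to] in steps:
        return (op, "positive")        # move lies on the solution path
    if frm in solution_path:
        return (op, "negative")        # one step off the solution path
    return (op, "unclassified")        # further off: not at fault

for m in moves:
    print(m, label(m))
```

Running this reproduces the classification given above: 1-2, 2-5, and 5-6 come out positive, the one-step departures negative, and anything deeper is left unlabeled.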
At least two recent strategy learning systems (Mitchell, Utgoff, and Banerji's LEX, and Langley's SAGE) have used this heuristic as their basic method for assigning credit and blame to components of their performance systems. Other systems, including Brazdil's ELM [14] and Kibler and Porter's learning system [15], have used a similar technique, though their programs required the solution path to be provided by a benevolent tutor. Sleeman, Langley, and Mitchell [16] have discussed the advantages of this method for "learning from solution paths".

One limitation of this approach is that it encounters difficulty in domains involving very long solution paths and extensive problem spaces. Obviously, one cannot afford to search exhaustively in a domain such as chess. In response, some researchers have begun to examine other methods that assign credit and blame while the search process is still under way. These include such heuristics as noting loops and unnecessarily long paths, noting dead ends, and noting failure to progress towards the goal. Systems that incorporate such "learning while doing" methods include Anzai's HAPS [17], Ohlsson's UPL [18], and Langley's SAGE.2 [19]. Ironically, these systems have all been tested in simple puzzle-solving domains, where the "learning from solution paths" method is perfectly adequate. One obvious research project would involve applying these and other methods to more complex domains with long solutions and extensive search spaces.

4.2. Modifying the Performance System

Once credit and blame have been assigned to the moves made during the search process, one can modify the performance system so that it prefers desirable moves to undesirable ones. If the performance component is stated as a set of condition-action rules, then one can employ the same methods used in learning from examples.
In other words, one can search the space of conditions, looking for some combination that will predict all positive instances but none of the negative instances. However, since multiple operators are involved, one must search a separate rule space for each operator. When one or more rules have been found for each operator, they can be used to direct search through the original problem space; if these rules are sufficiently specific, they will eliminate search entirely.

However, the task of learning search heuristics does place some constraints on the modification method that is employed. In particular, the learning system must be able to generate both positive and negative instances of its operators. This poses no problem for discrimination-based learning systems, since they begin with overly general move-proposing rules that lead naturally to search. However, generalization-based systems are naturally conservative, preferring to make errors of omission rather than errors of commission. Such an approach works well if a tutor is present to provide positive and negative examples, but it encounters difficulties if a system must generate its own behavior. Ohlsson [18] has reported a mixed approach in which specific rules are preferred, but very general move-proposing rules are retained and used in cases where none of the specific rules are matched. However, in its pure form, generalization-based methods do not seem appropriate for heuristics learning.

4.3. Open Problems in Heuristics Learning

We have seen that heuristics learning can be viewed as the general case of learning from examples, and many of the open problems in this area are closely related to those for concept learning. For instance, one can imagine complex domains for which no perfect rules exist to direct the search process. In such cases, one might still be able to learn probabilistic rules that will lead search down the optimum path in most cases.
This situation is closely related to the task of learning concepts from noisy data. Similarly, one can imagine attempting to learn search heuristics with an incorrect or incomplete representation. Finally, there are many domains in which some moves are better than others, but for which no absolute good or bad moves exist. As with learning from examples, most of the existing heuristics learning systems assume that "all or none" rules exist. Thus, even if one could modify the credit assignment methods to deal with such continuous classifications, it is not clear how one would alter the modification components of these systems. Each of these problems has been largely ignored in the machine learning literature, but we expect to see more work on them in the future.

One recent departure from the syntactic methods we described above corresponds closely with the causal reasoning approach to learning from examples. Rather than relying on multiple solution paths to learn the heuristic conditions on a set of operators, Mitchell, Utgoff, and Banerji [20] have explored a method for gathering maximum information from a single solution path. This method involves reasoning backwards from the goal state, and determining which features of each previous state allowed the final operator in the sequence to apply. This method is used for each operator along the solution path, resulting in a macro-operator that is guaranteed to lead to the goal state. This method is very similar to that employed by Fikes, Hart, and Nilsson [21] in their early STRIPS system. Carbonell [22, 23] has explored a somewhat different but related approach in his work on problem solving by analogy. During its attempt to solve a problem, Carbonell's system retains information not only about the operators it has applied, but about the reasons they were applied.
Upon coming to a new problem, the system determines whether similar reasons hold there, and if so, attempts to solve the current problem by analogy with the previous one. Both Mitchell's and Carbonell's methods involve analyzing the solution path in order to take advantage of all the available information. As with learning from examples, this approach to learning search heuristics has definite advantages over the more syntactic approaches, and we expect it to become more popular in the future.

5. Learning from Observation: Conceptual Clustering

For the moment, let us return to the task of learning concepts from examples. Another of the simplifying assumptions made in this task is that the tutor provides the learner with explicit feedback by telling him whether an instance is an example of the concept to be learned. However, if we examine very young children, it is clear that they acquire concepts such as "dog" and "chair" long before they know the words for these classes. Similarly, scientists form classification schemes for animals, chemicals, and even galaxies with no one to guide them. Thus, it is clear that concept learning can occur without the presence of a benevolent tutor to provide feedback. The task of learning concepts in this way is sometimes called learning by observation.

5.1. The Conceptual Clustering Task

There are different types of learning by observation, but let us focus on what Michalski and Stepp [24] have called conceptual clustering, since this bears an interesting relation to learning from examples. In the conceptual clustering paradigm, one is presented with a set of objects or observations, each having an associated set of features. The goal is to divide this set into classes and subclasses, with similar objects being placed together. The result is a taxonomic tree similar to those used in biology for classifying organisms.
In fact, biologists and statisticians have developed methods for generating such taxonomies from a set of observations. However, these methods (such as cluster analysis and numerical taxonomy) allow only numeric attributes (e.g., length of tail), while the conceptual clustering task also allows symbolic features. Consider the set of objects shown in Figure 6, which vary on four binary attributes: size, shape, color, and thickness of the border. Only four of the sixteen possible objects are observed, and the task is to divide these into disjoint groups that cover the observed objects but do not predict any of the unobserved ones. The classification tree shown in the figure satisfies these constraints while reflecting the regularities in the data. For instance, size and color are the only features that are completely correlated, since all large objects are red and all small objects are blue. Thus, these two features are ideal for dividing the observations into two groups at the highest level. However, within these groups finer distinctions can be made, and the features of border thickness and shape are useful at this level.

This example points out two additional complexities in the conceptual clustering task over learning from examples. First, classification schemes nearly always involve disjunctive classes, and any successful method must be able to handle them. (A conjunctive clustering task would be one in which only a single object was observed, and would not be very interesting.) Second, concepts must be learned at multiple levels. For instance, in the above example the "concept" ((large and red) or (small and blue)) must be generated at the first level, while the concepts ((thick and square) or (thin and circle)) and ((thick and circle) or (thin and square)) must be learned at the second level.
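The first-level split in this example can be recovered mechanically by testing each pair of attributes for complete correlation across the observations. Below is a minimal sketch in Python; the object descriptions follow the prose, and the attribute names and values are our own encoding of the figure, not taken from any published system:

```python
from itertools import combinations

# The four observed objects of the Figure 6 example (4 of 16 possible).
objects = [
    {"size": "large", "color": "red",  "shape": "square", "border": "thick"},
    {"size": "large", "color": "red",  "shape": "circle", "border": "thin"},
    {"size": "small", "color": "blue", "shape": "circle", "border": "thick"},
    {"size": "small", "color": "blue", "shape": "square", "border": "thin"},
]
attributes = ["size", "color", "shape", "border"]

def correlated(a, b, objs):
    """True if every value of attribute a always co-occurs with the
    same value of attribute b (complete correlation)."""
    seen = {}
    return all(seen.setdefault(o[a], o[b]) == o[b] for o in objs)

# The only completely correlated pair defines the first-level split.
top = [p for p in combinations(attributes, 2) if correlated(*p, objects)]
# top is [("size", "color")]

# Group the observations by the values of that pair; each group then
# becomes a node to be subdivided by the remaining attributes.
a, b = top[0]
groups = {}
for o in objects:
    groups.setdefault((o[a], o[b]), []).append(o)
```

Running this yields the two top-level classes (large, red) and (small, blue), each containing two objects; the same correlation test, applied recursively within each group over the remaining attributes, would produce the second-level disjuncts described above.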
Thus, the task of conceptual clustering can be viewed as a version of learning from examples that is more difficult along a number of dimensions: namely, the absence of explicit feedback, the presence of disjuncts, and the need for concepts at multiple levels of description.

[Figure 6. Four observed objects and the classification tree that divides them into groups and subgroups.]


Journal:
  • JASIS

Volume 35, Issue -

Pages -

Publication date: 1984